The Shannon theory of information has had a profound
impact on science and technology. Shannon defined information
in terms of the reduction of uncertainty which, in
turn, was measured by entropy. He was concerned mainly
with the use of information to measure the ability to transmit
data through noisy channels, i.e., channel capacity.

Statisticians have developed other, somewhat related, notions
of information. In statistical theory, the major emphasis
has been on how well experimental data help to achieve
the goals in the classical statistical problems of estimation
and hypothesis testing. These measures serve two useful
functions. They set a standard for methods of data
analysis, methods whose efficiencies are measured in terms
of the proportion of the available information that is effectively
used. They also guide the design of efficient experiments.



For the problem of estimation, Fisher introduced the
Fisher Information, which we now define. Suppose that it is
desired to estimate a parameter $\theta$ using the result of an
experiment which yields the data $X$ with the density $f(x|\theta)$.
The Fisher Information for $\theta$ corresponding to $X$ is given by
the matrix

$$J = J_X(\theta) = E_\theta\left(Y Y^T\right) \tag{1}$$

where $Y$ is the score function defined by

$$Y = Y(X,\theta) = \frac{\partial \log f(X|\theta)}{\partial \theta} \tag{2}$$

If $\theta$ is a multidimensional vector, $J$ is a nonnegative definite
symmetric matrix with the additive property

$$J_{(X,Z)} = J_X + J_Z \tag{3}$$

if $X$ and $Z$ are independent. As a consequence

$$J_{(X_1,\ldots,X_n)} = nJ_X = nJ \tag{4}$$

if the left subscript refers to the independent replication of $X$,
$n$ times. For such an experiment, it has been shown, under
mild regularity conditions, that the Maximum Likelihood
Estimate (MLE) $\hat\theta$ will be approximately normally distributed
with mean $\theta$ and covariance matrix $J^{-1}/n$ for large
$n$. Moreover, the Cramér-Rao theorem states that one cannot
expect to find a reasonable estimate that does better.
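As a rough numerical check of definitions (1) and (2), the following sketch estimates the Fisher Information by Monte Carlo for a normal model with unknown mean and known standard deviation. The model, the function names, and the values of `theta`, `sigma` and `n` are illustrative assumptions, not part of the original text.

```python
import random

def score(x, theta, sigma):
    """Score Y = d/d(theta) log f(x | theta) for a N(theta, sigma^2) density."""
    return (x - theta) / sigma ** 2

def fisher_information_mc(theta, sigma, draws=200_000, seed=0):
    """Monte Carlo estimate of J = E_theta(Y^2) for a scalar parameter."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        y = score(rng.gauss(theta, sigma), theta, sigma)
        total += y * y
    return total / draws

# Illustrative values (assumptions).
theta, sigma, n = 1.0, 2.0, 25

J = fisher_information_mc(theta, sigma)   # theory: 1 / sigma^2 = 0.25
asymptotic_var = 1.0 / (n * J)            # J^{-1} / n, theory: sigma^2 / n = 0.16
print(J, asymptotic_var)
```

For this model the estimate should be close to $\sigma^{-2}$, and $J^{-1}/n$ should be close to the familiar variance $\sigma^2/n$ of the sample mean.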

Some implications of the above paragraph are illustrated 
by three simple examples below. 

Example 1. Mean of a Normal Distribution.

Let $X$ be normally distributed with mean $\theta$ and known
variance $\sigma^2$, and let $X_1, X_2, \ldots, X_n$ be a sample of $n$
independent observations on $X$. It is easy to show that $J_X = \sigma^{-2}$ and
that the MLE $T_1 = \bar{X} = n^{-1}(X_1 + \cdots + X_n)$ is normally
distributed with mean $\theta$ and variance $\sigma^2/n$. However, the
statistician who fears outliers and wishes to use a
more robust estimator than the sample mean may prefer to
use $T_2$, the sample median. It can be shown that $T_2$ is
approximately normally distributed with mean $\theta$ and variance
$\pi\sigma^2/2n$ for large $n$. The equation

$$\frac{\sigma^2}{n_1} = \frac{\pi\sigma^2}{2n_2} \tag{5}$$

which equates the variance of $T_1$ based on $n_1$ observations to
that of $T_2$ based on $n_2$ observations,
implies $n_1/n_2 = 2/\pi = 0.64$, which is a natural measure of the
efficiency of $T_2$, indicating that, with $T_1$, we need only 64%
of the data to achieve the same accuracy as with $T_2$. If the
effective waste of 36% of the data seems excessive, the
statistician can improve on efficiency with little sacrifice of
robustness, e.g., by using the upper and lower quartiles as
well as the median, or by using trimmed means.
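The efficiency figure of Example 1 can be checked by simulation. The sketch below draws repeated normal samples and compares the sampling variances of the mean and the median. The sample size, number of replications and seed are arbitrary assumptions, and the simulated ratio only approximates the large-sample value $2/\pi$.

```python
import random
import statistics

def estimator_variances(n=101, reps=2000, theta=0.0, sigma=1.0, seed=1):
    """Simulate the sampling variances of the mean (T1) and the median (T2)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(theta, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means), statistics.variance(medians)

var_mean, var_median = estimator_variances()

# Efficiency of the median relative to the mean; large-n theory gives 2 / pi.
efficiency = var_mean / var_median
print(efficiency)
```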

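Example 2. Experiments With Information Matrices.

Let $\theta = (\theta_1, \theta_2)^T$, and let $X$ and $Z$ be two experiments with
information matrices

$$J_X = \begin{pmatrix} 4 & 3 \\ 3 & 4 \end{pmatrix} \quad\text{and}\quad J_Z = \begin{pmatrix} 4 & -3 \\ -3 & 4 \end{pmatrix}.$$

It is desired to estimate $\theta_1$ using replications of either $(X,X)$
or $(Z,Z)$ or $(X,Z)$. Let $J^{11}$ be the upper left member of $J^{-1}$,
which measures the (asymptotic) variance of $\hat\theta_1$, the MLE of
$\theta_1$. Then

$$J_{XX} = \begin{pmatrix} 8 & 6 \\ 6 & 8 \end{pmatrix}, \quad J_{ZZ} = \begin{pmatrix} 8 & -6 \\ -6 & 8 \end{pmatrix}, \quad J_{XZ} = \begin{pmatrix} 8 & 0 \\ 0 & 8 \end{pmatrix},$$

$$J^{11}_{XX} = 0.286, \quad J^{11}_{ZZ} = 0.286, \quad\text{and}\quad J^{11}_{XZ} = 0.125.$$

This clearly indicates that, in the presence of nuisance
parameters such as $\theta_2$, one may squeeze much more useful
information out of a combination of two equally informative
experiments than by repeating one of them twice or, in this
case, even four times.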
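The arithmetic of Example 2 can be reproduced in a few lines. The sketch below uses the additive property to combine the two information matrices and reads off the upper left member of each inverse. The helper names are illustrative assumptions.

```python
def add(a, b):
    """Sum of two 2x2 information matrices (additivity for independent experiments)."""
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def upper_left_of_inverse(m):
    """Upper left member of the inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[1][1] / det

J_X = [[4, 3], [3, 4]]
J_Z = [[4, -3], [-3, 4]]

J_XX = add(J_X, J_X)        # X repeated twice
J_ZZ = add(J_Z, J_Z)        # Z repeated twice
J_XZ = add(J_X, J_Z)        # one X and one Z
J_XXXX = add(J_XX, J_XX)    # X repeated four times

v_xx = upper_left_of_inverse(J_XX)       # 8/28, about 0.286
v_zz = upper_left_of_inverse(J_ZZ)       # 8/28, about 0.286
v_xz = upper_left_of_inverse(J_XZ)       # 1/8 = 0.125
v_xxxx = upper_left_of_inverse(J_XXXX)   # 16/112, about 0.143
print(v_xx, v_zz, v_xz, v_xxxx)
```

Even four replications of $X$ leave an asymptotic variance of about 0.143 for $\hat\theta_1$, which is still larger than the 0.125 obtained from a single pair $(X,Z)$.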

Example 3. Estimate Safe Dose Level in Probit Model.

For an experiment at dose level $d$, the probit model
attributes the probability of a response to be

$$p(d,\theta) = \Phi[(d-\mu)/\sigma] \tag{6}$$

where $\theta = (\mu,\sigma)^T$, $\Phi$ is the standard normal cumulative
distribution and the "safe" dose level to be estimated is defined
as $\mu - 2.87\sigma$. If one is permitted to select a sequence of $n$
dose levels, $d_1, d_2, \ldots, d_n$, with which to challenge $n$ subjects,
the optimal choice, or design, for estimating $\mu - 2.87\sigma$ can
be shown to assign about 23% of the doses at level
$d = \mu + 1.57\sigma$ and the remaining 77% of the doses at level
$d = \mu - 1.57\sigma$.
This optimal design illustrates several points.

1. This design is locally optimal, i.e., it requires a
knowledge of $\theta$ to provide the best estimate of a function of
$\theta$. Superficially, it seems silly, for if we knew $\theta$, we would
not need to estimate it. In fact, it indicates that as data
accumulate, one knows more about $\theta$ and can sequentially
use that information to provide improved experiments.

2. In this experiment, the repeated use of one dose level
$d$ would provide only an estimate of the function $p(d,\theta)$
and would yield no other useful information about $\theta$ or
$\mu - 2.87\sigma$. At least two dose levels are required. What is
somewhat surprising is that no more than two dose levels are
required for an optimal design. A more general theorem
states that if it is desired to estimate $r$ functions of $k$ parameters
upon which the distribution of the data depends, then an
optimal design can be constructed using at most $k + (k-1)
+ \cdots + (k-r+1)$ of the available (elementary) experiments.

3. The optimal design is not necessarily a practical one.
Most investigators would be interested in using a variety of
dose levels as a means of checking the basic model. Theory
permits us to measure the loss of information inherent in the
use of practical, but suboptimal designs, so that one can
decide on whether the loss is so extravagant that other alternatives
should be considered.
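The design quoted in Example 3 can be checked numerically. The sketch below assumes the standard per-subject Fisher Information for a binary response with probability $p(d,\theta)$, namely $(\nabla p)(\nabla p)^T / [p(1-p)]$, works in standardized units ($\mu = 0$, $\sigma = 1$), and searches only over symmetric two-point designs at $\pm a$ with weight `lam` on the upper level. The function names, the grid, and the restriction to symmetric designs are assumptions of the sketch, not the method used to derive the result in the text.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def info_matrix(z):
    """Per-subject information for (mu, sigma) at standardized dose z (sigma = 1)."""
    p = Phi(z)
    w = phi(z) ** 2 / (p * (1.0 - p))
    return [[w, w * z], [w * z, w * z * z]]

def variance_of_safe_dose(a, lam, k=2.87):
    """n times the asymptotic variance of the MLE of mu - k*sigma for a design
    with weight lam at z = +a and weight 1 - lam at z = -a."""
    hi, lo = info_matrix(a), info_matrix(-a)
    m = [[lam * hi[i][j] + (1.0 - lam) * lo[i][j] for j in range(2)]
         for i in range(2)]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    inv = [[m[1][1] / det, -m[0][1] / det],
           [-m[1][0] / det, m[0][0] / det]]
    c = (1.0, -k)  # gradient of mu - k*sigma with respect to (mu, sigma)
    return sum(c[i] * inv[i][j] * c[j] for i in range(2) for j in range(2))

# Coarse grid search (grid limits and step are arbitrary assumptions).
grid_a = [i / 100 for i in range(50, 301)]
grid_lam = [i / 100 for i in range(1, 100)]
best_var, best_a, best_lam = min(
    (variance_of_safe_dose(a, lam), a, lam) for a in grid_a for lam in grid_lam
)
print(best_a, best_lam, best_var)
```

Within this restricted family the search should land near $a = 1.57$ with roughly 23% of the subjects at the upper level. The same function can be used to measure the loss of information from a more practical design by comparing its variance with `best_var`.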

We mention briefly that for testing hypotheses, there are
several measures of information which are of potential use,
depending on the type of problem. Perhaps the most useful
measure is the Kullback-Leibler Information (KL)

$$I_X(\theta,\phi) = \int f(x|\theta) \log\frac{f(x|\theta)}{f(x|\phi)}\,dx \tag{7}$$

which measures two important aspects of the ability to use
a sample of $n$ observations on $X$ with distribution $f(x)$, to
discriminate between the hypotheses $H_1$: $f(x) = f(x|\theta)$ and
$H_2$: $f(x) = f(x|\phi)$.

The Kullback-Leibler Information is additive, as is the
Fisher Information, but it is not symmetric, since $I_X(\theta,\phi)$ is
not generally equal to $I_X(\phi,\theta)$. For large samples, it is
possible to find tests which, for fixed type 1 error probability
$\alpha = P(\text{reject } H_1 | H_1)$, have the type 2 error probability
$\beta = P(\text{accept } H_1 | H_2)$ approach zero at a rate determined by $I_X$.
We have, roughly

$$\beta \approx e^{-nI_X(\theta,\phi)} \tag{8}$$
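To make the asymmetry and the error rate concrete, the following sketch evaluates the discrete analogue of (7) for a Bernoulli experiment, where the integral becomes a sum over the two outcomes. The success probabilities and the sample size are illustrative assumptions, and the last quantity is only the rough rate of (8), not an exact error probability.

```python
import math

def kl_bernoulli(p, q):
    """I(theta, phi) for Bernoulli data with success probability p under theta
    and q under phi (discrete analogue of the integral definition)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

# Illustrative hypotheses (assumptions): H1 has p = 0.5, H2 has p = 0.9.
theta, phi = 0.5, 0.9

forward = kl_bernoulli(theta, phi)    # I_X(theta, phi), about 0.511
backward = kl_bernoulli(phi, theta)   # I_X(phi, theta), about 0.368

n = 50
info_n = n * forward                  # additivity over n independent observations
beta_rate = math.exp(-info_n)         # rough size of the type 2 error
print(forward, backward, beta_rate)
```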



Another property of KL is that for optimal sequential testing,
as the cost $c$ per observation approaches zero, the expected
costs $R(\theta)$ and $R(\phi)$ associated with the sequential procedure
when $H_1$ and $H_2$ are true satisfy

$$\begin{aligned} R(\theta) &\approx -c \log c / I_X(\theta,\phi) \\ R(\phi) &\approx -c \log c / I_X(\phi,\theta) \end{aligned} \tag{9}$$



This implies that if we suspect $H_1$ is true, we should select
the experiment which maximizes $I_X(\theta,\phi)$ and if we suspect
that $H_2$ is true, we should maximize $I_X(\phi,\theta)$. Here again, as
in the estimation problem, we are in a position to improve
the experimental design as information accumulates, and our
belief in $H_1$ or $H_2$ increases.
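The selection rule can be sketched with two hypothetical Bernoulli experiments, each described by its success probability under $H_1$ and under $H_2$. The experiments "A" and "B", the cost `c`, and the function names are assumptions chosen only to show that the preferred experiment can change with the hypothesis currently suspected.

```python
import math

def kl_bernoulli(p, q):
    """Kullback-Leibler Information between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def approx_expected_cost(c, info):
    """Small-c approximation -c log c / I for the expected cost."""
    return -c * math.log(c) / info

# Hypothetical experiments: (success probability under H1, under H2).
experiments = {"A": (0.5, 0.9), "B": (0.9, 0.5)}

def best_experiment(suspect_h1):
    """Pick the experiment with the largest KL in the suspected direction."""
    def info(name):
        p1, p2 = experiments[name]
        return kl_bernoulli(p1, p2) if suspect_h1 else kl_bernoulli(p2, p1)
    return max(experiments, key=info)

choice_if_h1 = best_experiment(True)    # "A"
choice_if_h2 = best_experiment(False)   # "B"

c = 0.001
cost_h1 = approx_expected_cost(c, kl_bernoulli(*experiments[choice_if_h1]))
print(choice_if_h1, choice_if_h2, cost_h1)
```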

To return to chemical experimentation, one should point
out that an experimental setup which yields vast amounts of
bits of information is not very useful if the analysis of the
data does not make efficient use of the data. To discriminate
between two alternatives requires only one bit of effective
information in the Shannon sense. The choice between experiments
which yield 1,000 and 10,000 bits must involve
how much effective information is readily available from
the analysis.

Some bibliography on the uses of information in statistics
is contained in Chernoff (1972).

[1] Chernoff, H., Sequential Analysis and Optimal Design, SIAM monograph 8,
SIAM, Philadelphia (1972).